The Design of Syntactic Annotation Levels in the National Corpus of Polish

Authors

  • Katarzyna Glowinska
  • Adam Przepiórkowski
Abstract

This paper presents the procedure of the syntactic annotation of the National Corpus of Polish. Syntactic annotation consists here of shallow parsing and manual post-editing of the results by annotators. The description concentrates on the delimitation of syntactic words and groups, as well as on problems encountered during the annotation process.


Similar resources

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...


Recent Developments in the National Corpus of Polish

The aim of the paper is to present recent developments (as of July 2009) in the construction of the National Corpus of Polish. The main developments are: 1) the design of text encoding XML schemata for various levels of linguistic information, 2) a new tool for manual annotation at various levels, 3) numerous improvements in search tools.


Syntactic processing of the IPI PAN Corpus of Polish

The aim of this paper is to present recent and ongoing work on adorning the IPI PAN Corpus of Polish (Przepiórkowski 2004, 2006a) with partial syntactic annotation, with the ultimate aim of building a treebank of Polish. The work described here is a part of the project Automatic extraction of linguistic knowledge from a large corpus of Polish (a Ministry of Education and Science grant number 3T...


On Heads and Coordination in Valence Acquisition

The aim of this paper is to present the design of a partial syntactic annotation of the IPI PAN Corpus of Polish [22] and the corresponding extension of the corpus search engine Poliqarp [25,12] developed at the Institute of Computer Science PAS and currently employed in Polish and Portuguese corpora projects. In particular, we will argue for the need to distinguish between, and represent both, ...


Lexicons and Grammars for Named Entity Annotation in the National Corpus of Polish

We present initial results in the named entity annotation subtask of a project aiming at creating the National Corpus of Polish. We summarize the annotation requirements defined for this corpus, and we discuss how existing lexical resources and grammars for Polish named entities have been adapted to meet those requirements. We show first results of the corpus annotation using the information extra...




Publication date: 2010